quadratic term
QuadEnhancer: Leveraging Quadratic Transformations to Enhance Deep Neural Networks
Chen, Qian, Yang, Linxin, Wang, Akang, Luo, Xiaodong, Zhang, Yin
The combination of linear transformations and non-linear activation functions forms the foundation of most modern deep neural networks, enabling them to approximate highly complex functions. This paper explores the introduction of quadratic transformations to further increase nonlinearity in neural networks, with the aim of enhancing the performance of existing architectures. To keep parameter and computational complexity low, we propose a lightweight quadratic enhancer that uses low-rankness, weight sharing, and sparsification techniques. For a fixed architecture, the proposed approach introduces quadratic interactions between features at every layer while adding only a negligible number of additional parameters and forward computations. We conduct proof-of-concept experiments across three tasks: image classification, text classification, and fine-tuning large language models. In all tasks, the proposed approach yields clear and substantial performance gains.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.05)
- Asia > China > Hong Kong (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
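To make the idea concrete, here is a minimal PyTorch-style sketch of a low-rank quadratic interaction added on top of a layer's features; the module name, the residual placement, and the rank are illustrative assumptions, and the paper's weight-sharing and sparsification tricks are omitted.

```python
import torch
import torch.nn as nn

class LowRankQuadEnhancer(nn.Module):
    """Adds a low-rank quadratic interaction to a layer's features.

    Illustrative sketch only: the abstract additionally uses weight sharing
    and sparsification, which are omitted here for brevity.
    """
    def __init__(self, dim: int, rank: int = 4):
        super().__init__()
        # Two thin factors keep the quadratic term at O(dim * rank) parameters
        # instead of the O(dim^2) a dense bilinear form would require.
        self.U = nn.Linear(dim, rank, bias=False)
        self.V = nn.Linear(dim, rank, bias=False)
        self.out = nn.Linear(rank, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Elementwise product of two low-rank projections gives a rank-r
        # quadratic interaction between input features.
        quad = self.U(x) * self.V(x)
        return x + self.out(quad)  # residual: original features plus quadratic term
```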
Bregman Alternating Direction Method of Multipliers
The mirror descent algorithm (MDA) generalizes gradient descent by using a Bregman divergence to replace squared Euclidean distance. In this paper, we similarly generalize the alternating direction method of multipliers (ADMM) to Bregman ADMM (BADMM), which allows the choice of different Bregman divergences to exploit the structure of problems. BADMM provides a unified framework for ADMM and its variants, including generalized ADMM, inexact ADMM and Bethe ADMM. We establish the global convergence and the O(1/T) iteration complexity for BADMM. In some cases, BADMM can be faster than ADMM by a factor of O(n / ln n), where n is the dimensionality. In solving the linear program of the mass transportation problem, BADMM leads to massive parallelism and can easily run on GPUs. BADMM is several times faster than the highly optimized commercial solver Gurobi.
- North America > United States > Minnesota (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Middle East > Jordan (0.04)
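To see what replacing the squared Euclidean distance with a Bregman divergence buys, here is a small NumPy sketch of entropic mirror descent on the probability simplex, the textbook special case where the negative-entropy mirror map turns the additive gradient step into a multiplicative one. The toy objective and step size are illustrative assumptions; BADMM applies the same kind of substitution inside ADMM's augmentation term.

```python
import numpy as np

def entropic_mirror_descent(grad, x0, eta=0.1, steps=100):
    """Mirror descent on the simplex with the negative-entropy mirror map.

    Replacing gradient descent's squared Euclidean distance with the KL
    (Bregman) divergence yields a multiplicative-weights update that keeps
    the iterate on the probability simplex without an explicit projection.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x * np.exp(-eta * grad(x))   # exponentiated-gradient step
        x /= x.sum()                     # renormalize onto the simplex
    return x

# Toy objective: minimize <c, x> over the simplex (optimum puts all mass on argmin c).
c = np.array([0.3, 0.1, 0.6])
x_star = entropic_mirror_descent(lambda x: c, np.ones(3) / 3)
print(x_star)  # mass concentrates on index 1
```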
A quantum annealing approach to graph node embedding
Node embedding is a key technique for representing graph nodes as vectors while preserving structural and relational properties, which enables machine learning tasks like feature extraction, clustering, and classification. While classical methods such as DeepWalk, node2vec, and graph convolutional networks learn node embeddings by capturing structural and relational patterns in graphs, they often require significant computational resources and struggle with scalability on large graphs. Quantum computing provides a promising alternative for graph-based learning by leveraging quantum effects and introducing novel optimization approaches. Variational quantum circuits and quantum kernel methods have been explored for embedding tasks, but their scalability remains limited due to the constraints of noisy intermediate-scale quantum (NISQ) hardware. In this paper, we investigate quantum annealing (QA) as an alternative approach that mitigates key challenges associated with quantum gate-based models. We propose several formulations of the node embedding problem as a quadratic unconstrained binary optimization (QUBO) instance, making it compatible with current quantum annealers such as those developed by D-Wave. We implement our algorithms on a D-Wave quantum annealer and evaluate their performance on graphs with up to 100 nodes and embedding dimensions of up to 5. Our findings indicate that QA is a viable approach for graph-based learning, providing a scalable and efficient alternative to previous quantum embedding techniques.
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Burnaby (0.04)
- Europe > Bulgaria > Sofia City Province > Sofia (0.04)
- Energy (0.93)
- Government > Regional Government (0.46)
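As a toy illustration of what a QUBO encoding of an embedding-style problem looks like, the sketch below assigns each node a one-hot binary code so that adjacent nodes prefer to share a code. The specific objective, penalty weight, and brute-force solve are illustrative assumptions rather than one of the paper's formulations; an annealer such as D-Wave's would sample the same matrix instead of enumerating assignments.

```python
import itertools
import numpy as np

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2, 3).
n, d = 6, 2
edges = {(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)}
lam = 2.0  # strength of the one-hot ("pick exactly one code") penalty

def var(i, k):
    # Flatten (node i, code k) into a single binary-variable index.
    return i * d + k

# Build the QUBO matrix Q so that the energy of a binary assignment x is x^T Q x.
Q = np.zeros((n * d, n * d))
for i, j in itertools.combinations(range(n), 2):
    sign = -1.0 if (i, j) in edges else +1.0  # edges reward a shared code, non-edges penalize it
    for k in range(d):
        Q[var(i, k), var(j, k)] += sign
for i in range(n):
    # One-hot penalty lam * (sum_k x_ik - 1)^2, expanded into QUBO coefficients.
    for k in range(d):
        Q[var(i, k), var(i, k)] += -lam
    for k, l in itertools.combinations(range(d), 2):
        Q[var(i, k), var(i, l)] += 2 * lam

# Brute-force the 2^(n*d) assignments; a quantum annealer would sample Q instead.
best = min(itertools.product([0, 1], repeat=n * d),
           key=lambda x: np.array(x) @ Q @ np.array(x))
print(np.array(best).reshape(n, d))  # each triangle ends up with its own one-hot code
```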
Nearly Optimal Differentially Private ReLU Regression
Ding, Meng, Lei, Mingxi, Wang, Shaowei, Zheng, Tianhang, Wang, Di, Xu, Jinhui
In this paper, we investigate one of the most fundamental nonconvex learning problems, ReLU regression, in the Differential Privacy (DP) model. Previous studies on private ReLU regression heavily rely on stringent assumptions, such as constant bounded norms for feature vectors and labels. We relax these assumptions to a more standard setting, where data can be i.i.d. sampled from $O(1)$-sub-Gaussian distributions. We first show that when $\varepsilon = \tilde{O}(\sqrt{\frac{1}{N}})$ and there is some public data, it is possible to achieve an upper bound of $\tilde{O}(\frac{d^2}{N^2 \varepsilon^2})$ for the excess population risk in $(\varepsilon, \delta)$-DP, where $d$ is the dimension and $N$ is the number of data samples. Moreover, we relax the requirement on $\varepsilon$ and public data by proposing and analyzing a one-pass mini-batch Generalized Linear Model Perceptron algorithm (DP-MBGLMtron). Additionally, using the tracing-attack argument, we demonstrate that the minimax rate of the estimation error for $(\varepsilon, \delta)$-DP algorithms is lower bounded by $\Omega(\frac{d^2}{N^2 \varepsilon^2})$. This shows that DP-MBGLMtron achieves the optimal utility bound up to logarithmic factors. Experiments further support our theoretical results.
- North America > United States > California (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Belgium > Flanders > East Flanders > Ghent (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)
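A simplified NumPy sketch of a one-pass, mini-batch GLMtron-style update for ReLU regression with per-example clipping and Gaussian noise is shown below. The learning rate, clipping threshold, and noise scale are placeholder assumptions and are not the calibrated values that the paper's $(\varepsilon, \delta)$-DP analysis would require.

```python
import numpy as np

def dp_minibatch_glmtron(X, y, batch_size=64, lr=0.1, clip=1.0, noise_std=1.0, seed=0):
    """One pass of a mini-batch GLMtron-style update for ReLU regression,
    perturbed with Gaussian noise. Illustrative only: a real DP guarantee
    requires a noise scale calibrated to the clipping norm, batch size,
    and target (epsilon, delta).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for start in range(0, n, batch_size):          # single pass over the data
        Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        residual = np.maximum(Xb @ w, 0.0) - yb     # ReLU prediction error
        grads = residual[:, None] * Xb              # per-example GLMtron gradients
        norms = np.maximum(np.linalg.norm(grads, axis=1) / clip, 1.0)
        grads = grads / norms[:, None]              # clip per-example sensitivity
        noisy_grad = grads.mean(axis=0) + rng.normal(0, noise_std * clip / len(Xb), d)
        w -= lr * noisy_grad
    return w
```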
Export Reviews, Discussions, Author Feedback and Meta-Reviews
Overview: This paper studies the benefits of augmenting the linear programming relaxation of the maximum a posteriori (MAP) inference problem in graphical models with a quadratic term, thereby achieving strong convexity. Such augmented formulations are obtained from both the original primal and dual formulations, and in each case the resulting primal-dual relationship is studied. Prior work has mostly focused on smoothing the LP formulation with a softmax/entropy term, with a few notable exceptions, such as [5], [17], and [18]. Unlike those previous approaches, which employ a quadratic term in the sub-problems of either a *proximal* or an *alternating direction* scheme, the present manuscript adds the quadratic smoothing term directly. This can, in some ways, be seen as a naive approach: in contrast to proximal or alternating direction schemes, convergence to the global optimum of the original problem is no longer guaranteed, and the approximation quality depends directly on the strength of the augmentation term.
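One generic way to write this kind of quadratic augmentation (a standard smoothing construction, not necessarily the exact formulation of the paper under review): the LP relaxation of MAP inference over the local marginal polytope $\mathcal{L}$ is perturbed by a quadratic term of strength $\gamma > 0$,
$$ \max_{\mu \in \mathcal{L}} \langle \theta, \mu \rangle \quad\longrightarrow\quad \max_{\mu \in \mathcal{L}} \langle \theta, \mu \rangle - \frac{\gamma}{2}\,\|\mu\|_2^2 , $$
which makes the objective strongly concave (and the corresponding dual smooth) but shifts the maximizer away from the LP optimum by an amount controlled by $\gamma$, the trade-off the review points to.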
QuadraNet V2: Efficient and Sustainable Training of High-Order Neural Networks with Quadratic Adaptation
Xu, Chenhui, Wang, Xinyao, Yu, Fuxun, Xiong, Jinjun, Chen, Xiang
Machine learning is evolving towards high-order models that necessitate pre-training on extensive datasets, a process associated with significant overheads. Traditional models, despite having pre-trained weights, are becoming obsolete due to architectural differences that obstruct the effective transfer and initialization of these weights. To address these challenges, we introduce a novel framework, QuadraNet V2, which leverages quadratic neural networks to create efficient and sustainable high-order learning models. Our method initializes the primary term of the quadratic neuron using a standard neural network, while the quadratic term is employed to adaptively enhance the learning of data non-linearity or shifts. This integration of pre-trained primary terms with quadratic terms, which possess advanced modeling capabilities, significantly augments the information characterization capacity of the high-order network. By utilizing existing pre-trained weights, QuadraNet V2 reduces the required GPU hours for training by 90% to 98.4% compared to training from scratch, demonstrating both efficiency and effectiveness.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > China > Liaoning Province > Dalian (0.04)
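A rough PyTorch-style sketch of the initialization idea: reuse a pretrained linear layer as the primary term and start a low-rank quadratic term at zero, so the layer initially reproduces the pretrained model and training only has to learn the non-linear correction. The module name, low-rank parameterization, and zero-initialization are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn

class QuadraticNeuronLayer(nn.Module):
    """Pretrained linear (primary) term plus a learnable rank-r quadratic term.

    Sketch only: because U starts at zero, the layer's output initially equals
    the pretrained linear layer's output, and the quadratic term is learned
    during fine-tuning.
    """
    def __init__(self, pretrained: nn.Linear, rank: int = 4):
        super().__init__()
        out_dim, in_dim = pretrained.weight.shape
        self.primary = pretrained                               # reuse pretrained weights
        self.U = nn.Parameter(torch.zeros(in_dim, rank))        # zero-init quadratic factor
        self.V = nn.Parameter(torch.randn(in_dim, rank) * 0.02)
        self.out = nn.Linear(rank, out_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        quad = (x @ self.U) * (x @ self.V)       # rank-r quadratic feature interactions
        return self.primary(x) + self.out(quad)  # pretrained linear path + quadratic correction

# Hypothetical usage: wrap an existing pretrained layer, e.g.
# layer = QuadraticNeuronLayer(pretrained_model.fc, rank=4)
```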
Energy-Preserving Reduced Operator Inference for Efficient Design and Control
Koike, Tomoki, Qian, Elizabeth
Many-query computations, in which a computational model for an engineering system must be evaluated many times, are crucial in design and control. For systems governed by partial differential equations (PDEs), typical high-fidelity numerical models are high-dimensional and too computationally expensive for the many-query setting. Thus, efficient surrogate models are required to enable low-cost computations in design and control. This work presents a physics-preserving reduced model learning approach that targets PDEs whose quadratic operators preserve energy, such as those arising in the governing equations of many fluids problems. The approach is based on the Operator Inference method, which fits reduced model operators to state snapshot and time-derivative data in a least-squares sense. However, Operator Inference does not generally learn a reduced quadratic operator with the energy-preserving property of the original PDE. Thus, we propose a new energy-preserving Operator Inference (EP-OpInf) approach, which imposes this structure on the learned reduced model via constrained optimization. Numerical results using the viscous Burgers' equation and the Kuramoto-Sivashinsky equation (KSE) demonstrate that EP-OpInf learns efficient and accurate reduced models that retain this energy-preserving structure.
- North America > United States > New York (0.04)
- Oceania > New Zealand (0.04)
- North America > United States > Illinois (0.04)
- (2 more...)
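For reference, a bare-bones NumPy sketch of the unconstrained Operator Inference regression that EP-OpInf builds on: linear and quadratic reduced operators are fit to reduced states and their time derivatives by least squares. The energy-preserving constraint on the quadratic operator, which the paper imposes via constrained optimization, is omitted here; the variable names and Kronecker-style quadratic features are illustrative.

```python
import numpy as np

def operator_inference(Xr, Xr_dot):
    """Fit x_dot ≈ A x + H (x ⊗ x) to reduced snapshot data by least squares.

    Xr, Xr_dot: arrays of shape (r, K) holding K reduced states and their time
    derivatives. Returns the linear operator A (r x r) and quadratic operator
    H (r x r^2). EP-OpInf would additionally constrain H so the quadratic term
    conserves energy; that constraint is omitted in this sketch.
    """
    r, K = Xr.shape
    quad = np.einsum('ik,jk->ijk', Xr, Xr).reshape(r * r, K)  # columnwise x ⊗ x
    D = np.vstack([Xr, quad])                                 # data matrix, (r + r^2, K)
    # Solve min_O ||O D - Xr_dot||_F via the transposed least-squares problem.
    O, *_ = np.linalg.lstsq(D.T, Xr_dot.T, rcond=None)
    O = O.T                                                   # shape (r, r + r^2)
    return O[:, :r], O[:, r:]

# Tiny smoke test with synthetic, purely linear dynamics.
rng = np.random.default_rng(0)
Xr = rng.standard_normal((3, 200))
A_true = rng.standard_normal((3, 3))
A_hat, H_hat = operator_inference(Xr, A_true @ Xr)
print(np.allclose(A_hat, A_true, atol=1e-6))  # recovers A; H comes out ~0
```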